Hopfield networks were introduced in 1982 by John Hopfield, and they represent the return of Neural Networks to the Artificial Intelligence field. I will briefly explore their continuous version as a means to understand Boltzmann Machines.
Hopfield nets are mainly used as associative memories and for solving optimization problems. An associative memory links concepts by association: for example, when you hear about the Eiffel Tower or see an image of it, you might recall that it is in Paris.
The inputs and outputs of the Hopfield network are binary, so we can train it to remember B&W images such as a collection of letters. For instance, we can teach it the following set of letters:
So how does the Hopfield network recall a pattern? The recalling process is iterative, and in each cycle all the neurons in the net are fired. The process ends when the state of the neurons no longer changes, as seen in the next figure.
We shall set up the notebook with the required libraries:
In [1]:
%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.lines as mlines
import matplotlib.patches as mpatches
A simple representation for each letter is an array, so the patterns can be stored in a matrix:
In [2]:
patterns = np.array([
    [-1,-1,-1,-1,1,1,-1,1,1,-1,1,-1,1,1,-1,1,-1,1,1,-1,1,-1,-1,-1,1.],  # Letter D
    [-1,-1,-1,-1,-1,1,1,1,-1,1,1,1,1,-1,1,-1,1,1,-1,1,-1,-1,-1,1,1.],  # Letter J
    [1,-1,-1,-1,-1,-1,1,1,1,1,-1,1,1,1,1,-1,1,1,1,1,1,-1,-1,-1,-1.],   # Letter C
    [-1,1,1,1,-1,-1,-1,1,-1,-1,-1,1,-1,1,-1,-1,1,1,1,-1,-1,1,1,1,-1.], # Letter M
])
And we can plot any letter with the following function:
In [3]:
def plot_letter(letter):
    black = '#000000'; white = '#FFFFFF'; gray = '#AAAAAA'
    squareSide = 1
    fig = plt.figure(figsize=(0.35, 0.35))
    ax = plt.axes([0, 0, 6.5, 6.5])
    ax.set_ylim([0, 5])
    ax.set_xlim([0, 5])
    plt.gca().invert_yaxis()
    # Plotting squares: black for -1, white for +1, row by row from the top
    for i in range(25):
        coords = np.array([i % 5 * squareSide, (i // 5) * squareSide])
        color = black if letter[i] < 0 else white
        square = mpatches.Rectangle(coords, squareSide, squareSide,
                                    color=color, linewidth=0.5)
        ax.add_patch(square)
    # Plotting the 5x5 grid lines
    for i in range(6):
        x = i * squareSide
        s1, s2 = np.array([[x, x], [0, 5 * squareSide]])
        ax.add_line(mlines.Line2D(s1, s2, lw=1, color=gray))
        ax.add_line(mlines.Line2D(s2, s1, lw=1, color=gray))
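As a quick check, we can display the four stored letters by calling the function on each row of the pattern matrix:
In [ ]:
for pattern in patterns:
    plot_letter(pattern)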
A Hopfield net is a set of neurons that are:
- fully connected to each other through symmetric weights, i.e. $w_{ij}=w_{ji}$;
- not connected to themselves, i.e. $w_{ii}=0$.
Both properties are illustrated in Fig. 3, where a Hopfield network consisting of 5 neurons is shown.
One property that the diagram fails to capture is the recurrence of the network. Hopfield networks are recurrent because the inputs of each neuron are the outputs of the others, i.e. the network possesses feedback loops, as seen in Fig. 4. This last property is better understood through the recalling process.
In order to learn a set of patterns $\textbf{x}^{(n)}$, such as the four letters presented above, the weights of the net must be set using the Hebb Rule:
$$w_{ij}=\eta\sum_n{x_{i}^{(n)}x_{j}^{(n)}}$$where $\eta$ is a constant that can be set to $\eta=1/N$ for $N$ patterns. In the case of the continuous Hopfield network, this constant prevents the weights from growing with $N$. The idea of the Hebbian rule is that two neurons that repeatedly fire at the same time should develop a stronger association.
So the learning function is as follows:
In [4]:
def train(patterns):
    n, m = patterns.shape   # n patterns of m pixels each
    eta = 1. / n
    weights = np.zeros((m, m))
    for i in range(m - 1):
        for j in range(i + 1, m):
            # Hebb rule: neurons that fire together get a stronger (symmetric) connection
            weights[i, j] = eta * np.dot(patterns[:, i], patterns[:, j])
            weights[j, i] = weights[i, j]
    return weights
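As a side note, the same Hebb rule can also be written as an outer product over the whole pattern matrix. The sketch below should give the same weight matrix as the loop above, with the diagonal explicitly zeroed out:
In [ ]:
# Vectorized Hebb rule: W = eta * X^T X, with no self-connections
n, m = patterns.shape
weights_vec = (1. / n) * np.dot(patterns.T, patterns)
np.fill_diagonal(weights_vec, 0.)
print(np.allclose(weights_vec, train(patterns)))  # expected: True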
The activation $a_{i}$ of neuron $i$ is given by
$$a_{i}=\sum_j{w_{ij}x_{j}}$$and the update of the state $x_{i}$ for the continuous case is given by
$$x_{i}=\tanh{(a_{i})}$$The neuron updates can be performed synchronously or asynchronously. In the former case the activations and updates of all neurons are computed at the same time. In the latter case the neurons are activated and updated sequentially, in an order that is randomly permuted beforehand.
The recall function can be stated as:
In [5]:
def recall(weights, states, maxNumItr):
    EPS = np.finfo(float).eps
    activations = np.zeros(states.shape)
    states_are_unstable = True
    itr = 0
    while states_are_unstable and itr < maxNumItr:
        prevStates = states.copy()
        for i in np.random.permutation(states.size):  # asynchronous activation
            activations[i] = np.dot(weights[i, :], states)
            states[i] = np.tanh(activations[i])
        states_are_unstable = not np.allclose(states, prevStates, rtol=EPS)
        itr += 1
    return states
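For contrast, a synchronous update computes all activations from the same previous state and updates every neuron at once. A minimal sketch of such a variant (a recall_sync helper I introduce here only for illustration, not used in the rest of the post) could look like this:
In [ ]:
def recall_sync(weights, states, maxNumItr):
    # Synchronous version: every neuron is updated from the same previous state
    EPS = np.finfo(float).eps
    for itr in range(maxNumItr):
        prevStates = states.copy()
        states = np.tanh(np.dot(weights, prevStates))
        if np.allclose(states, prevStates, rtol=EPS):
            break
    return states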
After seeing the recalling-process example, one might wonder about many interesting questions, such as: will the Hopfield network always stabilize? The answer is yes, but the net might not recall the correct pattern. This can happen if the net is overloaded (too many patterns are stored) or if the weights have been altered.
Convergence to a stable state is guaranteed for networks that have the two main architectural characteristics listed above and that follow an asynchronous update procedure.
Please note that in this post I didn't provide any mathematical proof of convergence, nor did I talk about attractors, nonlinear dynamical systems, and Lyapunov functions. Maybe I will cover them in a future post; in the meantime, please see the references section.
Now let's code the example of Fig. 2. First, we have to train the weights with the memory patterns:
In [6]:
weights = train(patterns)
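We can quickly verify on the learned weights the two architectural properties mentioned earlier:
In [ ]:
print(np.allclose(weights, weights.T))    # symmetric weights, expected: True
print(np.allclose(np.diag(weights), 0.))  # no self-connections, expected: True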
Let's define and plot the example input pattern:
In [7]:
states = np.array([1,1,1,1,-1,1,-1,1,1,-1,1,1,-1,1,1,1,1,1,1,-1,-1,1,1,1,1.])
plot_letter(states)
Finally, let's run the recall process on the input pattern for four iterations (the asynchronous update order is random, so we fix the seed); note that it takes approximately twenty-five iterations to stabilize numerically:
In [8]:
np.random.seed(10)
for i in range(4):
    states = recall(weights, states, 1)
    plot_letter(states)
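As an extra sanity check, we can compare the sign of the recalled states against the stored patterns: the entry with the largest magnitude indicates the closest stored letter (a negative value would correspond to the inverted pattern):
In [ ]:
overlaps = np.dot(patterns, np.sign(states))  # one overlap score per stored letter (max 25)
print(overlaps)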
I heavily used the book Information Theory, Inference and Learning Algorithms by David MacKay while writing this post; the example presented here is discussed in depth in that same book. Another good source for understanding the training and update procedures is presented on this page. By the way, David MacKay was one of Hopfield's PhD students.
I also found these books useful: Neural Networks and Learning Machines, Bayesian Reasoning and Machine Learning, and Neural Networks - A Systematic Introduction. Other resources on the web are Wikipedia and Scholarpedia.